Transcribing radio news
Abstract
We have recently extended the capabilities of BBN's large vocabulary discrete-utterance speech recognition system (BYBLOS) to operate on raw audio recordings of radio news programming. The recordings are given to the system as large monolithic waveforms without any additional side information. Our goal is to transcribe all speech in the input with the highest accuracy possible. The problem is very challenging because radio news programming has frequent changes in speaker, speaking style, dialect, accent, topic, channel, and environmental conditions. Furthermore, the monolithic input presents new problems for recognition algorithms and language models since all useful boundaries (such as speaker turns or sentence ends) are unknown.
Related resources
Recent advances in transcribing television and radio broadcasts
Transcription of broadcast news shows (radio and television) is a major step in developing automatic tools for indexation and retrieval of the vast amounts of information generated on a daily basis. Broadcast shows are challenging to transcribe as they consist of a continuous data stream with segments of different linguistic and acoustic natures. Transcribing such data requires addressing two m...
Quick Rich Transcriptions of Arabic Broadcast News Speech Data
This paper describes the collection and transcription of a large set of Arabic broadcast news speech data. A total of more than 2000 hours of data was transcribed. The transcription factor for transcribing the broadcast news data has been reduced by using a method such as Quick Rich Transcription (QRTR) and by reducing the number of quality controls performed on the data. The data was collected f...
Discriminative rescoring based on minimization of word errors for transcribing broadcast news
This paper describes a novel method of rescoring that reflects tendencies of errors in word hypotheses in speech recognition for transcribing broadcast news, including ill-trained spontaneous speech. The proposed rescoring assigns penalties to sentence hypotheses according to the recognition error tendencies in the training lattices themselves using a set of weighting factors for feature functi...
Transcribing broadcast news with the 1997 Abbot System
Recent DARPA CSR evaluations have focused on the transcription of broadcast news from both television and radio programmes [17]. This is a challenging task because the data includes a variety of speaking styles and channel conditions. This paper describes the development of a connectionist-hidden Markov model (HMM) system, and the enhancements designed to improve performance on broadcast news d...
Transcription of broadcast news
In this paper we report on our recent work in transcribing broadcast news shows. Radio and television broadcasts contain signal segments of various linguistic and acoustic natures. The shows contain both prepared and spontaneous speech. The signal may be studio quality or have been transmitted over a telephone or other noisy channel (i.e., corrupted by additive noise and nonlinear distortions), ...
Publication date: 1996